Joyful Debugging with Jupyter Notebooks

Hi 👋 I'm Robin.

🐣 @funktoriell

absolventa.png

We run job boards!

job_board.png

Let's talk about debugging! 🤠

Crafting, verifying, rejecting, accepting hypotheses about problems with your app 🔬

Running, observing, repeating code to find patterns or to spot and understand problems 🔦

local development environment vs. production environment

simulation costs vs. real operation risks 💇🏻‍♀️

Sometimes there is a practical business need to work with real production data 📀

Running a Rails application?

heroku run console --app scumm-bar-production

Spawning ssh-REPL-sessions all day long

Issues with ssh-REPL-only-production-data-debugging (mine)

Problem 1: Singular point of knowledge.

  • 1 dev, 1 machine
  • no (async) knowledge sharing
  • output hell 🙈 urggh2.png

Without explicitly logging/copying it, there is almost no chance to

  • re-think the process more carefully later on
  • recap the results computed in that environment later on
  • repeat exactly the same (or similar) steps easily
  • let your teammates catch up

Problem 2: Potentially stressful setup.

Being in an SSH console session with full write access to the production database can be harmful to your app; at the very least, it's a stressful setup for you as a developer 🙀

Problem 3: Violation of engineering best practices

  • endlessly repeating yourself (session closed, all code is gone, new session […])
  • using crazy ad-hoc solutions even though you know how to solve the problem better (and forgetting what has been done a few hours later)
  • invites you to throw "good programming habits" away

I like debugging 💚

But I felt bad for doing it this way.

🔥 Hot take 🔥

Debugging is, at its core, deeply entangled with

Exploratory Data Analysis

We should not feel bad for doing EDA.

We should embrace, foster and harness it

for our development process! 🔧

Why not borrow tooling from the Data Science universe? 🤨

The notion of a notebook: An "annotated" REPL

Bildschirmfoto%202019-12-05%20um%2017.43.44.png

jupyter-example.png

1980's Computational Notebooks

  • Mathematica
  • Maple
  • Matlab

Last decade: Open Source Notebook Tooling is a game changer for industry + academia

rstudio.png jupy2.png

A notebook is an interactive document made of a sequence of cells

cells.png

In a nutshell 🐿

Jupyter Notebook = REPL + Markdown notes + Visualizations

This presentation is a notebook too! 🤓

$$\int K \, d\mathrm{vol}_g$$
In [3]:
[1,2,3].each { |n| puts n*n }
1
4
9
Out[3]:
[1, 2, 3]

Under the hood, a notebook is a JSON document

  • jupyter spawns a web server
  • Server and kernel communicate via JSON messages over an AMQP-ish messaging system (ZeroMQ) json_example.png
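To make the "notebook is a JSON document" point concrete, here is a minimal sketch that parses a tiny hand-written notebook with Ruby's standard library. The layout follows the nbformat 4 structure (a top-level `cells` array, each cell with a `cell_type` and `source`). The cell contents and the `NOTEBOOK_JSON` constant are made up for illustration.

```ruby
require "json"

# A hand-written, minimal notebook in the nbformat 4 layout.
# The cell contents are invented for this example.
NOTEBOOK_JSON = <<~'JSON'
  {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
      {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Monkey census"]
      },
      {
        "cell_type": "code",
        "execution_count": 3,
        "metadata": {},
        "source": ["[1,2,3].each { |n| puts n*n }"],
        "outputs": [
          { "output_type": "stream", "name": "stdout",
            "text": ["1\n", "4\n", "9\n"] },
          { "output_type": "execute_result", "execution_count": 3,
            "metadata": {}, "data": { "text/plain": ["[1, 2, 3]"] } }
        ]
      }
    ]
  }
JSON

notebook = JSON.parse(NOTEBOOK_JSON)

# Summarize each cell as [type, source]; "source" is stored as a list of lines.
summary = notebook["cells"].map do |cell|
  [cell["cell_type"], cell["source"].join]
end

summary.each { |type, source| puts "#{type}: #{source}" }
```

Because the file is plain JSON, outputs are stored next to the code that produced them, which is what makes a notebook shareable as a record of a debugging session.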

Jupyter notebook has a kernel loaded

(a programming language process that interprets the code cells and produces output cells)
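A rough idea of what a kernel does with a code cell, as a toy sketch in plain Ruby. This is not how the real Ruby kernel is implemented; `ToyKernel` and its reply hash are hypothetical names. It only shows the two things a cell produces: whatever was printed, and the value of the last expression.

```ruby
require "stringio"

# Toy stand-in for a kernel (hypothetical, for illustration only):
# evaluates one cell and returns what it printed plus its last value.
class ToyKernel
  def initialize
    # One shared binding, so variables survive between cells.
    @binding = Object.new.instance_eval { binding }
    @execution_count = 0
  end

  def execute(source)
    @execution_count += 1
    captured = StringIO.new
    original_stdout = $stdout
    $stdout = captured # collect everything the cell prints
    begin
      value = @binding.eval(source)
    ensure
      $stdout = original_stdout
    end
    { count: @execution_count, stdout: captured.string, result: value.inspect }
  end
end

kernel = ToyKernel.new
reply = kernel.execute("[1, 2, 3].each { |n| puts n * n }")

puts "In [#{reply[:count]}]: printed #{reply[:stdout].inspect}"
puts "Out[#{reply[:count]}]: #{reply[:result]}"
```

This mirrors the earlier `In [3]` / `Out[3]` cell: the squares are printed output, while `[1, 2, 3]` is the return value of `each`.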

Namesake kernels: Julia, Python, R (the Python kernel ships by default)

There's also a Ruby kernel! ☀️

sciruby.png sciruby2.png

Starting a notebook is easy, just run

$ jupyter-notebook

It starts a local web server serving the "notebook app" (let's do it a little later on!)

Extremely useful: https://github.com/SciRuby/daru

panda.png

"Pandas for Ruby"

In [4]:
require "daru"

df = Daru::DataFrame.new(
  island: ["Mêlée_Island", "Plunder Island", "Monkey Island"],
  monkey_population: [776, 166, 1329],
  size: [1, 10, 3]
)
Out[4]:
Daru::DataFrame(3x3)
island monkey_population size
0 Mêlée_Island 776 1
1 Plunder Island 166 10
2 Monkey Island 1329 3
In [6]:
# Choose columns
df[:island]
Out[6]:
Daru::Vector(3)
island
0 Mêlée_Island
1 Plunder Island
2 Monkey Island
In [7]:
# Sophisticated filtering
df.where(df[:size] > 1)
df.where(df[:island].eq("Plunder Island") | (df[:monkey_population] < 1000))
Out[7]:
Daru::DataFrame(2x3)
island monkey_population size
0 Mêlée_Island 776 1
1 Plunder Island 166 10
In [8]:
owners = Daru::DataFrame.new(
  name: ["Bob", "Alice", "Julie"], pet_id: [1, 2, 2]
)
pets = Daru::DataFrame.new(
  name: ["Suzy", "Manfred", "George"], 
  pet_id: [1, 2, 3], 
  type: ["cat", "dinosaur", "dog"]
)
Out[8]:
Daru::DataFrame(3x3)
name pet_id type
0 Suzy 1 cat
1 Manfred 2 dinosaur
2 George 3 dog
In [1]:
# You can join them together!
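The slide does not show the join itself. daru ships its own `join` for data frames, but since the exact call is not in the slides, the sketch below only illustrates what an inner join on `pet_id` computes, using plain Ruby arrays of hashes with the same data as the two frames above. The output column names (`owner`, `pet`, `type`) are chosen for this example.

```ruby
# Same data as the two data frames above, as plain arrays of hashes.
owners = [
  { name: "Bob",   pet_id: 1 },
  { name: "Alice", pet_id: 2 },
  { name: "Julie", pet_id: 2 }
]
pets = [
  { name: "Suzy",    pet_id: 1, type: "cat" },
  { name: "Manfred", pet_id: 2, type: "dinosaur" },
  { name: "George",  pet_id: 3, type: "dog" }
]

# Index pets by their id so each owner lookup is a hash access.
pets_by_id = pets.each_with_object({}) { |pet, index| index[pet[:pet_id]] = pet }

# Inner join: keep only owners whose pet_id matches a pet.
joined = owners.map do |owner|
  pet = pets_by_id[owner[:pet_id]]
  next unless pet
  { owner: owner[:name], pet: pet[:name], type: pet[:type] }
end.compact

joined.each { |row| puts row.values.join(" | ") }
```

George the dog has no owner in this data, so he does not appear in the joined result; Manfred appears twice because two owners reference him.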

What about Rails?

draisine.png

iruby-rails.png jupyter_on_rails.png

Jupyter On Rails adds an ActiveRecord extension for Daru

data_frame.png

Jupyter On Rails adds a rake task for packaging your Rails app into a Ruby kernel

$ rake jupyter:notebook

You can use all your models etc in your notebook ☀️

✨ Visualization ✨

Visualization: R has ggplot2 😍

Bildschirmfoto%202019-12-03%20um%2016.27.24.png

Python has a great visualization ecosystem, too!

Bildschirmfoto%202019-12-04%20um%2008.46.43.png

(taken from "altair")

Some visualization libraries - Ruby ecosystem

☝️Possibly there are more

Chartkick

chartkick.png chartkick%20Kopie.png

We've made a little wrapper gem to combine it with iruby notebooks

iruby-chartkick.png

In [13]:
require "iruby/chartkick"
include IRuby::Chartkick
data = JobOffer.group_by_day(:created_at).size
bar_chart(data, points: false, max: 5)
Out[13]:
[bar chart: job offers created per day]

Some possible traps

  • Notebooks carry a hidden state ☝️
  • Replicability is better, but only on the surface 🤨
  • Sometimes bad for teaching: Students are mindlessly executing stuff they do not understand
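The hidden-state trap can be simulated in plain Ruby. In this sketch, each string stands for one notebook cell and a shared binding plays the role of the kernel's memory. The cell contents and the `run_cells` helper are invented for illustration.

```ruby
# Each string stands for one notebook cell; the contents are invented.
cells = {
  define: "threshold = 1000",
  change: "threshold = 500",
  filter: "[776, 166, 1329].select { |population| population < threshold }"
}

# Evaluate the named cells in the given order inside one shared binding,
# the way a kernel keeps variables alive between cell executions.
# Returns the value of the last cell that was run.
def run_cells(cells, order)
  session = Object.new.instance_eval { binding }
  order.map { |name| session.eval(cells[name]) }.last
end

# Reading the notebook top to bottom suggests this result ...
top_to_bottom = run_cells(cells, [:define, :change, :filter])

# ... but re-running the first cell before the filter gives another one.
out_of_order = run_cells(cells, [:define, :change, :define, :filter])

puts "top to bottom: #{top_to_bottom.inspect}"
puts "out of order:  #{out_of_order.inspect}"
```

The saved notebook looks identical in both cases; only the execution order differs. Restarting the kernel and running all cells from the top before sharing a notebook is one way to keep the visible document and the hidden state in sync.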

Recap: Did notebooks improve my debugging cycles?

  • Persistence + reproducibility of problem descriptions: Huge improvement for me 👍👍👍
  • In contrast to caveats above: Notebooks encourage me to extend our internal library ecosystem with (tested) abstractions exclusively for the notebooks 👍
  • Far more joy thanks to visualizations and data frames 👍
  • I've spotted bugs only by making "arbitrary" visualizations for fun 👍👍👍!
  • Solution for the "production data" problem: Connecting to a production replica database from the Jupyter notebook with READONLY access 👍

jobpyter.png jobpyter2.png

Thanks for your time!